The safety and effectiveness of music medicine as an intervention for depression: A systematic evaluation and re‐evaluation

Abstract
Background: As the methodological quality and evidence level of the existing systematic reviews (SRs) on music as an intervention for depression have not been thoroughly evaluated, a systematic evaluation and re‐evaluation (SERE) was conducted.
Methods: Multiple databases including PubMed, Web of Science, Embase, China National Knowledge Infrastructure, SinoMed, Wanfang, and the VIP database were searched for SRs and meta‐analyses (MAs) on the effectiveness of music as an intervention for depression. The literature screening, evaluation of methodological quality, and assessment of evidence level were carried out by a team of researchers. Reporting and methodological quality were evaluated using the 2020 Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) guidelines and the Assessment of Multiple Systematic Reviews 2 (AMSTAR 2) scale, and the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) criteria were used to assess the level of evidence.
Results: A total of 18 SRs were included in the analysis. Evaluation against the 2020 PRISMA guidelines showed that search terms, funding sources, statistical methods for missing values, subgroup and sensitivity analyses, certainty assessment, excluded literature citations, assessment of publication bias, protocol information, conflicts of interest, and data availability were rarely reported. Evaluation using the AMSTAR 2 scale showed that one article was rated as high quality, six were rated as low quality, and 11 were rated as very low quality. Based on the GRADE criteria, the quality of the evidence was inconsistent, with reports primarily consisting of medium‐quality evidence.
Conclusion: The methodological quality of SRs/MAs of music as an intervention in depression is generally poor, and the level of evidence is generally low.


INTRODUCTION
Depression, also known as depressive disorder, is characterized by core symptoms such as low mood and loss of interest that are not proportional to the situation (Monroe & Harkness, 2022). Additional symptoms may include anxiety, agitation, hallucinations, and delusions (Monroe & Harkness, 2022). This mental health condition is highly prevalent and carries a high risk of mortality and disability (Benasi et al., 2021). Globally, 5% of adults experience depression annually, resulting in significant social and economic losses for individuals, families, communities, and countries (Herrman et al., 2022). Before the COVID-19 pandemic, depression-related economic losses were estimated to be approximately $1 trillion per year (Herrman et al., 2022).
Although the etiology and pathogenesis of depression are not fully understood, research suggests that they may involve central nervous inflammation, intestinal microecological destruction, neurotransmitter abnormalities, and hypothalamic-pituitary-adrenal axis disorders (Hao et al., 2022; He et al., 2022).
At present, depression is treated through two main approaches: drug therapy and nondrug therapy. Both Western and traditional Chinese medicine offer drug therapies for depression. However, antidepressants used in Western medicine may have problems such as poor efficacy, high recurrence rate, and strong adverse reactions (Fournier et al., 2010; Thase & Denko, 2008). The therapeutic effects and mechanisms of antidepressants used in Chinese medicine are unclear (Hao et al., 2022). Nondrug therapies such as physical therapy and psychotherapy are also commonly used.
Among the physical therapies, nonconvulsive electroconvulsive therapy is the most widely recognized nondrug treatment, and it can be combined with antidepressant drug therapy for additional benefits. However, nonconvulsive electroconvulsive therapy also has problems such as uncertain clinical efficacy and adverse reactions (G. P. Zhong, 2020). Psychotherapy includes cognitive therapy, psychological support therapy, and mindfulness-based stress reduction (Hu et al., 2022; Pampallona et al., 2004), with cognitive therapy being the most common type (Hu et al., 2022). The use of cognitive therapy is limited by its long treatment duration and by its lack of coverage by medical insurance companies in China, which leads to a high cost for patients (Ma et al., 2021).
Music has been widely used in research and clinical applications in Western countries due to its long history (Llovet, 2017). Music has the ability to induce positive emotions and relax the body and mind, and its antidepressant effect may be mediated by its impact on serotonin transmission and hippocampal brain-derived neurotrophic factor levels in the central nervous system (Lin et al., 2011). Previous studies have confirmed the positive therapeutic effects of music on depression (Feneberg et al., 2021; Fu et al., 2023; Wall et al., 2023; X. Wang et al., 2023; Xue et al., 2023). Music-based interventions can be broadly categorized into two types: music therapy and music medicine. Music therapy is a systematic intervention process facilitated by a certified music therapist who, in an evidence-based manner, utilizes specially designed musical forms and therapeutic relationships formed during the process to assist the recipients in achieving mental and physical health goals (Chen & Gao, 2022). Music medicine involves interventions primarily focused on music listening provided by nonqualified individuals (Aalbers et al., 2017). Consequently, the standards for music medicine are broader than those for music therapy, making it more accessible in clinical treatments and everyday life. Moreover, music medicine is characterized by its simplicity, low cost, minimal resource requirements, and greater acceptance among individuals with depression (Werner et al., 2017).
Music medicine intervention for depression is not a novel concept. In fact, extensive clinical research has been conducted worldwide. The number of systematic reviews/meta-analyses (SRs/MAs) based on the results of these clinical studies is also considerable. SRs/MAs, as a crucial research method, serve as the cornerstone for evaluating clinical effectiveness, formulating clinical guidelines, and standardization. They are also a significant source of evidence in evidence-based medicine (Li & Li, 2008). At the same time, low-quality SRs/MAs can mislead clinical decisions. Systematic evaluation and re-evaluation (SERE), also known as umbrella review, stands at the top of the evidence-based medicine pyramid (Huang et al., 2023; D. Y. Zhong et al., 2022). SERE rigorously integrates and evaluates the evidence reported in previous SRs/MAs. Its evaluation results can provide valuable references and strong guidance for clinical decision-making. SERE is equally applicable to the field of music medicine intervention in depression. The quality, methodology, and evidence of previously published SRs/MAs on music medicine intervention for depression have not been systematically re-evaluated. This study aims to re-evaluate SRs/MAs on music medicine intervention in depression, providing more comprehensive evidence for clinical decisions on music medicine intervention in depression.

Inclusion criteria
a. The subjects of the study are patients with depression. The type of depression is not restricted, and both primary and secondary depression can be included. The treatment group focuses on music-based interventions, which may be combined with or without antidepressant medication. Music therapy is not included.
b. The control group consists of conventional treatment groups, which may be combined with or without antidepressant medication. However, the study should disclose that there is no significant difference between the combined treatment in the experimental group and the treatment in the control group.
f. The language of the study is limited to Chinese and English, excluding other languages.
g. The publication date of the SRs/MAs should be on or before November 12, 2022.

Exclusion criteria
a. Exclude articles that do not meet the requirements of the study
c. The population observed in this study consists of patients with depression, whether primary or secondary. Articles that do not include patients with depression or that exclude depression will be excluded.
d. In this study, we have predesigned some observational indicators. Articles that do not include these indicators will also be excluded.
e. In addition to the four requirements mentioned above, articles with completely duplicated content or articles that cannot be accessed in full will also be excluded.

Data extraction and quality evaluation
Two authors retrieved and filtered the articles, and then extracted and reviewed the data. Disagreements were resolved through negotiation with a third author. If any data were missing, corresponding authors were contacted for additional information. Articles that remained incomplete were excluded. The methodological quality and level of evidence of the included papers were assessed by four authors using the 2020 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, the Assessment of Multiple Systematic Reviews 2 (AMSTAR 2), and the Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) (Guyatt et al., 2008; Page et al., 2021; Shea et al., 2017). Following the evaluation, another author checked the data.

Search results and basic characteristics of the included studies
After a search of seven databases, 466 articles were obtained for review. Overall, 35 articles were retrieved from CNKI, 19 from Wanfang, 9 from VIP, 22 from SinoMed, 88 from PubMed, 129 from Web of Science, 163 from Embase, and 1 from another source. Using the E-study software of CNKI, 182 duplicate articles were excluded.
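As a quick consistency check on the counts reported above, the per-database figures can be summed and the duplicates subtracted. This is a minimal sketch rather than part of the original analysis; the variable names are illustrative, and the number of records remaining after deduplication is derived here by subtraction, not quoted from the text.

```python
# Records retrieved per source, as reported in the text.
records = {
    "CNKI": 35,
    "Wanfang": 19,
    "VIP": 9,
    "SinoMed": 22,
    "PubMed": 88,
    "Web of Science": 129,
    "Embase": 163,
    "Other source": 1,
}

# Total retrieved should match the 466 articles reported.
total = sum(records.values())

# Duplicates removed with the E-study software, as reported.
duplicates = 182

# Derived value (not stated in the text): records left for screening.
remaining = total - duplicates

print(f"Total retrieved: {total}")
print(f"Remaining after deduplication: {remaining}")
```

The per-database counts add up to the reported total of 466, which leaves 284 records after the 182 duplicates are removed.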
Detailed evaluation results are shown in Table 2.

Evidence evaluation of GRADE criteria
Eight SRs evaluated depressive symptom indicators (Q. Tang et al., 2020; Tsai et al., 2014; X. N. Wang & Lin, 2017; Y. X. Wang et al., 2020; Yang et al., 2019; Zhang et al., 2022; Zhao et al., 2016; Zhu et al., 2021) using nine measures. GRADE evaluation results showed that one SR was rated as high quality, six were rated as medium quality, and one was rated as very low quality. Five SRs evaluated the treatment effective rate (X. F. Dai et al., 2015; Dayuan et al., 2022; Yan et al., 2019; Yu et al., 2020; Zou et al., 2017). GRADE evaluation results showed that all five SRs were rated as medium quality. Seven SRs evaluated the Hamilton depression scale index (X. F. Dai et al., 2015; Dayuan et al., 2022; Fan et al., 2015; Wan et al., 2018; Yan et al., 2019; Yu et al., 2020; Zou et al., 2017). The GRADE evaluation results showed that one SR was rated as high quality, five SRs were rated as medium quality, and one SR was rated as very low quality. Six SRs evaluated the self-rating depression scale (SDS) (Y. Q. [...] 2015; Ji et al., 2015; Liu & Ji, 2021; Yu et al., 2020). GRADE evaluation results showed that all six SRs were rated as medium quality. Detailed GRADE evaluation results are shown in Table 4.

Abbreviations: CCT, case-control study; COPD, chronic obstructive pulmonary disease; CRT, Cochrane risk-of-bias tool; HDRS/Ham-D, Hamilton Depression Scale; Jadad, Jadad score; N-RCT, non-randomized controlled trial; PSD, post-stroke depression; RCT, randomized controlled trial.

The primary findings of this study
SRs are widely recognized as the best evidence synthesis studies in clinical decision-making (Cook et al., 1997). Therefore, it is highly necessary to conduct SERE in order to better utilize SRs as a tool and provide superior evidence for clinical practice. In this study, a total of 18 SRs on music as an intervention for depression were included. The quality and evidence level of these reviews were assessed using the 2020 PRISMA guidelines, AMSTAR 2, and GRADE.

The methodological quality of SRs on music interventions for depression needs improvement
This study evaluated the methodological quality of the included literature using PRISMA and AMSTAR 2. Among the 18 included articles, both PRISMA and AMSTAR 2 identified numerous missing items, indicating poor reporting quality of these SRs. In the methods section, PRISMA scoring primarily focused on the literature search process, literature screening process, and description of statistical methods. We found that only two studies detailed the search strategies for each database (Wan et al., 2018; Yang et al., 2019), while most studies merely described the search terms, with some listing search strategies for individual databases. Although all 18 articles detailed the processes of literature screening and data extraction, only three articles named the researchers involved in these operations (Y. Q. Dai et al., 2016; Q. Tang et al., 2020; Zhu et al., 2021). Few SRs provided detailed explanations regarding handling of missing data, subgroup analyses, sensitivity analyses, and grading of evidence. In the appendices, only two SRs mentioned a protocol (Dayuan et al., 2022; Yang et al., 2019). Additionally, Chinese SRs rarely included conflict of interest and data accessibility statements.

Results from AMSTAR 2 similarly revealed shortcomings in the 18 SRs. One SR was rated as high quality (Dayuan et al., 2022), six as low quality (Q. Tang et al., 2020; Tsai et al., 2014; Yan et al., 2019; Yu et al., 2020; Zhang et al., 2022; Zhao et al., 2016), and 11 as very low quality (X. F. Dai et al., 2015; Y. Q. Dai et al., 2016; Fan et al., 2015; Ji et al., 2015; Liu & Ji, 2021; Wan et al., 2018; X. N. Wang & Lin, 2017; Y. X. Wang et al., 2020; Yang et al., 2019; Zhu et al., 2021; Zou et al., 2017). The shortcomings were concentrated in items 2, 14, 15, and 16. Among the 18 articles included in this study, only two reported registration of study protocols (Dayuan et al., 2022; Yang et al., 2019), while the remaining 16 did not mention whether study protocols were registered in advance (X. F. Dai et al., 2015; Y. Q. Dai et al., 2016; Fan et al., 2015; Ji et al., 2015; Liu & Ji, 2021; Q. Tang et al., 2020; Tsai et al., 2014; Wan et al., 2018; X. N. Wang & Lin, 2017; Y. X. Wang et al., 2020; Yan et al., 2019; Yu et al., 2020; Zhang et al., 2022; Zhao et al., 2016; Zhu et al., 2021; Zou et al., 2017). Few studies analyzed in depth the impact of heterogeneity and publication bias on the conclusions of the SRs. Most studies did not detail the funding sources and potential conflicts of interest of their included articles. These results are consistent with those of PRISMA, indicating flaws in the design process of these SRs.

TABLE 2 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) quality evaluation of included studies. (Only fragments of two rows are recoverable; each level of conformity is reported as number of included studies and proportion, %.)
Reporting biases. Present assessments of risk of bias due to missing results (arising from reporting biases) for each synthesis assessed: 18 (Tsai et al., 2014; Fan et al., 2015; X. F. Dai et al., 2015; Ji et al., 2015 [...]
[...] any other materials used in the review: conditions met, 3 (Yang et al., 2019; Q. Tang et al., 2020; Dayuan et al., 2022), 16.67%; partial conditions met, 0, 0.00%; no conditions met, 15 (Tsai et al., 2014; Fan et al., 2015; X. F. Dai et al., 2015; Ji et al., 2015; Y. Q. Dai et al., 2016; Zhao et al., 2016; X. N. Wang & Lin, 2017; Zou et al., 2017; Wan et al., 2018; Yan et al., 2019; Y. X. Wang et al., 2020; Yu et al., 2020; Liu & Ji, 2021; Zhu et al., 2021; Zhang et al., 2022), 83.33%.

TABLE 3 Methodological quality evaluation results of the included studies.
Note: Item 1, whether the research question and inclusion criteria included the elements of population, intervention, comparison, and outcome (PICO). Item 2, whether the review was designed in advance and whether the report differs significantly from the proposed protocol. Item 3, explanation of the choice of study design types. Item 4, whether a comprehensive literature search strategy was used. Item 5, whether study screening was performed in duplicate. Item 6, whether data extraction was performed in duplicate. Item 7, whether excluded studies were listed and the exclusions justified. Item 8, whether the included studies were described in detail. Item 9, whether appropriate methods were used to assess the risk of bias in the primary studies. Item 10, whether the sources of funding for the included studies were reported. Item 11, whether appropriate methods were used to combine results. Item 12, whether the potential impact of bias in the primary studies on the meta-analysis results and other evidence synthesis was assessed. Item 13, whether the bias of the included studies was taken into account when interpreting and discussing the results of the systematic evaluation. Item 14, whether the heterogeneity observed in the evaluation results was reasonably explained or discussed. Item 15, whether, for quantitative synthesis, publication bias was fully investigated and its possible impact on the evaluation results discussed. Item 16, whether any potential conflict of interest was reported, including any funds received for the systematic evaluation. Y, yes; N, no; PC, partially consistent.

TABLE 4 Evidence evaluation results of Grading of Recommendations, Assessment, Development, and Evaluation (GRADE) criteria of total effective rate.
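The step from item-level AMSTAR 2 judgments to an overall rating can be sketched as follows. This is an illustrative sketch, not the authors' procedure: it assumes the critical domains given in the AMSTAR 2 guidance (items 2, 4, 7, 9, 11, 13, and 15) and, as a simplifying assumption, counts only "N" responses as weaknesses, so that "PC" is not penalized. The function name and the example review are hypothetical. The lowest category, which AMSTAR 2 calls "critically low", corresponds to the "very low quality" label used in this article.

```python
# Items treated as critical domains (assumption: the set named in the
# AMSTAR 2 guidance; review teams may justify a different set).
CRITICAL_ITEMS = {2, 4, 7, 9, 11, 13, 15}


def amstar2_rating(responses):
    """Map item-level responses to an overall confidence rating.

    responses: dict of item number (1-16) -> "Y", "PC", or "N".
    Simplifying assumption: only "N" counts as a weakness.
    """
    weak_items = [item for item, answer in responses.items() if answer == "N"]
    critical_flaws = sum(1 for item in weak_items if item in CRITICAL_ITEMS)
    non_critical = len(weak_items) - critical_flaws

    if critical_flaws > 1:
        return "critically low"
    if critical_flaws == 1:
        return "low"
    if non_critical > 1:
        return "moderate"
    return "high"


# Hypothetical review: no registered protocol (item 2, critical) and no
# funding sources reported for included studies (item 10, non-critical).
example = {item: "Y" for item in range(1, 17)}
example[2] = "N"
example[10] = "N"

print(amstar2_rating(example))
```

Under these assumptions, a single unmet critical item such as item 2 is enough to cap a review at "low", which is consistent with the observation above that 16 of the 18 SRs did not report protocol registration.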

The evidence strength of SRs on music interventions for depression is low
The GRADE tool was employed to grade the evidence for each outcome measure of the SRs, evaluating the credibility and reliability of this evidence. The results indicate that for the assessment of four outcome measures, the majority of SRs were rated as moderate and low quality. The primary reasons for downgrading were study limitations and publication bias. Study limitations were the most frequent downgrade factor, manifested in poor methodological quality of most primary studies, such as lack of blinding, inadequate randomization, and insufficient allocation concealment. Despite the increasing number of clinical studies related to music in recent years, most are small-scale, low-quality, repetitive studies with a lack of publication of negative results, high heterogeneity among studies, and a shortage of high-quality evidence from large-scale multicenter studies.
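The downgrading logic described above can be illustrated with a short sketch. It assumes the standard GRADE approach, in which evidence from randomized trials starts at high certainty and is lowered by one or two levels for each of five factors; the function and factor names are hypothetical, and the example inputs are not taken from Table 4. GRADE's "moderate" level corresponds to the "medium quality" label used elsewhere in this article.

```python
# Certainty levels ordered from lowest to highest.
LEVELS = ["very low", "low", "moderate", "high"]

# The five GRADE downgrading factors (names are illustrative).
FACTORS = (
    "study_limitations",
    "inconsistency",
    "indirectness",
    "imprecision",
    "publication_bias",
)


def grade_certainty(downgrades, start="high"):
    """Return the certainty level after applying downgrades.

    downgrades: dict of factor name -> levels to subtract (0, 1, or 2).
    start: starting level; "high" is assumed for randomized trials.
    """
    unknown = set(downgrades) - set(FACTORS)
    if unknown:
        raise ValueError(f"Unknown factors: {sorted(unknown)}")
    if any(value not in (0, 1, 2) for value in downgrades.values()):
        raise ValueError("Each downgrade must be 0, 1, or 2 levels")

    total = sum(downgrades.values())
    # The rating cannot fall below the lowest level.
    index = max(LEVELS.index(start) - total, 0)
    return LEVELS[index]


# Hypothetical outcome downgraded once for study limitations only.
print(grade_certainty({"study_limitations": 1}))

# Hypothetical outcome downgraded for both factors named in the text.
print(grade_certainty({"study_limitations": 1, "publication_bias": 1}))
```

In this sketch, one downgrade for study limitations yields moderate certainty, and adding a downgrade for publication bias yields low certainty, matching the two most common ratings reported above.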

Evidence of the Efficacy of Music Interventions for Depression
The Beck Depression Inventory is specifically designed to assess the severity of depression (Stepankova Georgi et al., 2019). Developed by the American psychologist Aaron T. Beck in the 1960s, it has since been widely utilized in clinical epidemiological surveys. The Hamilton depression scale (HAMD) is the most widely used depression assessment scale in clinical practice, employed to evaluate the severity of depression in patients (Berko et al., 2022; Carrozzino et al., 2020; D. Zhong et al., 2023). The SDS consists of 20 items with a 4-level scoring system, originally developed by W. K. Zung in 1965 (Jokelainen et al., 2019). Its characteristics include ease of use and the ability to intuitively reflect the subjective feelings of depressed patients and changes during treatment. The efficacy of treatment is evaluated from the perspective of the ratio of effectively treated patients. These four indicators have strong guiding significance for the evaluation of depression. This study included 18 SRs, among which eight SRs evaluated depressive symptoms (Q. Tang et al., 2020; Tsai et al., 2014; X. N. Wang & Lin, 2017; Y. X. Wang et al., 2020; Yang et al., 2019; Zhang et al., 2022; Zhao et al., 2016; Zhu et al., 2021). The publication dates of these eight SRs ranged from 2014 to 2022, and the clinical studies included in each SR did not entirely overlap. However, the results of these eight SRs were completely consistent, with all conclusions pointing to the effectiveness of music interventions for depression. Five SRs reported efficacy rates (X. F. Dai et al., 2015; Dayuan et al., 2022; Yan et al., 2019; Yu et al., 2020; Zou et al., 2017). The findings of these five SRs were essentially consistent, and the differences were all statistically significant. These results were also evident in the evaluations using HAMD and SDS. Considering the results of these 18 SRs collectively, the effectiveness of music interventions for depression is essentially demonstrated. However, regarding how music can further improve the efficacy of depression treatment, there are currently no more detailed SRs available for reference. Although Zhong investigated differences in improvement of depression based on factors such as the duration and frequency of sessions, the duration of treatment, and the volume of music listening, no definitive conclusions were drawn (Dayuan et al., 2022).

IMPLICATIONS FOR FUTURE RESEARCH
The SRs of music as an intervention for depression published so far have poor methodological quality and low levels of evidence. Additional evaluation is required to determine the efficacy of music as an intervention for depression. More rigorous methodology and comprehensive evaluation of the level of evidence are needed. Future researchers could adopt strict evaluation criteria, such as those described by the Cochrane system, to provide high-quality, evidence-based information that clinicians can use to assess the effectiveness of music as an intervention for depression.

LIMITATIONS
This study has certain limitations. First, the databases available to us are limited. Among the databases searched in this study, PubMed, Web of Science, CNKI, Wanfang, and VIP are the most commonly used databases accessible to us. Other social science databases, such as PsycINFO, Social Services Abstracts, and SocINDEX, cannot be accessed due to regional and policy limitations. Although the majority of Chinese social science papers are included in the CNKI, Wanfang, and VIP databases, some papers are not. Therefore, some papers meeting the inclusion criteria may have been overlooked.
Additionally, the music interventions varied across different articles. Some studies investigated receptive music, while others studied recreative music or improvisational music. Even within the category of listening to music, the type of music varied; some studies involved listening to light music, while others focused on classical music. These variations in music approaches contribute to the heterogeneity across studies, and we cannot assess the extent to which these different music modalities may influence the results.
Furthermore, the type of depression was not consistent across the included studies. Some studies focused on depression in the elderly, while others studied depression in adolescents. Patients also had different comorbidities, with some studies focusing on depression in cancer patients, cardiovascular disease patients, stroke patients, postpartum depression, or patients with chronic obstructive pulmonary disease. The severity of depression also varied, with some studies examining depressive mood while others focused on more severe depression. These different levels of depression further increase the heterogeneity among studies, and we cannot evaluate the extent to which this may affect the results.
Inclusion criteria (continued)
c. In addition, if the included articles are network MAs, they may contain various forms of music interventions, but we only extract the part related to music medicine. Other forms of music interventions are not included.
d. The observational indicators mainly include depressive symptoms, effective rate, Hamilton depression scale, self-rating depression scale, and other related indicators for patients with depression.
e. The included studies are limited to SRs/MAs, excluding other types of reviews and articles.
Basic characteristics of the included studies.